fix(security): datamarking substitutes zero-width formatting characters (closes #215) #231

Merged: epappas merged 1 commit into main from fix/is-060-datamarking-zwsp on May 17, 2026
@epappas epappas commented May 17, 2026

Collaborator

Summary

Closes #215. Follow-up to PR #214 (IS-060 PR-2 datamarking transform).

The gap

PR #214 used char::is_whitespace() as the datamarking classifier — per the PR-2 brief. Rust's is_whitespace follows the Unicode White_Space property, which excludes the zero-width / formatting codepoints commonly used as invisible prompt-injection vectors:

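  • ZWSP U+200B (ZERO WIDTH SPACE)
  • ZWNJ U+200C (ZERO WIDTH NON-JOINER)
  • ZWJ U+200D (ZERO WIDTH JOINER)
  • WJ U+2060 (WORD JOINER)
  • BOM U+FEFF (ZERO WIDTH NO-BREAK SPACE, used as the byte order mark)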

These passed through the Data-zone transform unchanged, allowing an attacker to smuggle invisible instructions inside an otherwise-marked Data zone.
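The gap can be confirmed directly against the standard library. The following is a minimal sketch, not code from this PR; the `ZERO_WIDTH` table and the `missed_by_is_whitespace` helper are hypothetical names introduced only for illustration.

```rust
/// The five zero-width / formatting codepoints listed above.
/// (Hypothetical table, introduced for this illustration only.)
const ZERO_WIDTH: [(&str, char); 5] = [
    ("ZWSP", '\u{200B}'),
    ("ZWNJ", '\u{200C}'),
    ("ZWJ", '\u{200D}'),
    ("WJ", '\u{2060}'),
    ("BOM", '\u{FEFF}'),
];

/// Returns the names of the listed codepoints that `char::is_whitespace`
/// does NOT classify as whitespace.
fn missed_by_is_whitespace() -> Vec<&'static str> {
    ZERO_WIDTH
        .iter()
        .filter(|(_, c)| !c.is_whitespace())
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    for (name, c) in ZERO_WIDTH {
        println!(
            "{name} U+{:04X}: is_whitespace = {}",
            c as u32,
            c.is_whitespace()
        );
    }
    // All five are outside the Unicode White_Space property.
    assert_eq!(missed_by_is_whitespace().len(), 5);
    // By contrast, NBSP (U+00A0) is White_Space and was already covered.
    assert!('\u{00A0}'.is_whitespace());
}
```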

The fix

Introduce a small predicate that augments char::is_whitespace with the five zero-width codepoints, and route the substitution loop through it:

pub fn is_substitutable_whitespace(c: char) -> bool {
    c.is_whitespace()
        || matches!(c, '\u{200B}' | '\u{200C}' | '\u{200D}' | '\u{2060}' | '\u{FEFF}')
}

This is Option 1 from issue #215 — the minimal-risk path. The Unicode whitespace surface is unchanged; we only widen the predicate by exactly the five zero-width codepoints called out as attack vectors. No public type or marker-selection logic changes.
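As a rough illustration of how the substitution loop might consume the predicate, here is a minimal sketch. Only `is_substitutable_whitespace` is taken from this PR; the `datamark` function, the fixed `MARKER` constant (U+E000, the marker mentioned in the testing notes), and the one-marker-per-character replacement are assumptions, and the real transform's marker selection and return type may differ.

```rust
/// Assumed marker. The real transform selects its PUA marker elsewhere;
/// U+E000 is used here only because the testing notes mention it.
const MARKER: char = '\u{E000}';

/// Predicate from this PR: Unicode White_Space plus five zero-width codepoints.
pub fn is_substitutable_whitespace(c: char) -> bool {
    c.is_whitespace()
        || matches!(c, '\u{200B}' | '\u{200C}' | '\u{200D}' | '\u{2060}' | '\u{FEFF}')
}

/// Hypothetical stand-in for the substitution loop: replaces every
/// substitutable character with MARKER and reports the change in UTF-8
/// byte length (output length minus input length).
fn datamark(input: &str) -> (String, isize) {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        out.push(if is_substitutable_whitespace(c) { MARKER } else { c });
    }
    let delta = out.len() as isize - input.len() as isize;
    (out, delta)
}

fn main() {
    let (marked, delta) = datamark("foo\u{200B}bar baz");
    // Both the invisible ZWSP and the ordinary space become markers.
    assert_eq!(marked, "foo\u{E000}bar\u{E000}baz");
    // ZWSP is 3 bytes -> 3 bytes (delta 0); space is 1 byte -> 3 bytes (delta +2).
    assert_eq!(delta, 2);
}
```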

Threat-model rationale

The Spotlighting datamarking guarantee is "any whitespace-equivalent gap in a Data zone is replaced by an out-of-band PUA marker so the model can distinguish data from instructions." If invisible characters can introduce gaps that the predicate does not see, the guarantee is broken: an attacker can use those characters to delimit "words" that the model still parses as instructions. Treating zero-width formatting codepoints as substitutable whitespace closes that hole without expanding the marker contract.

Testing

  • zwsp_is_not_substituted_by_design renamed/inverted to zwsp_is_substituted — now asserts ZWSP IS replaced (and validates the byte_delta: ZWSP and U+E000 are both 3 bytes UTF-8 → delta 0).
  • New per-codepoint coverage: zwnj_is_substituted, zwj_is_substituted, word_joiner_is_substituted, bom_is_substituted.
  • mixed_whitespace_classes_all_substituted retained — the existing Unicode White_Space set (space, tab, newline, NBSP, VT, FF) still substitutes unchanged.
  • idempotence_apply_twice_equals_apply_once extended with a mixed input containing both ordinary whitespace and zero-width codepoints ("hello world\nfoo\u{200B}bar\u{FEFF}baz"); second pass remains a no-op with zero byte_delta.
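The byte_delta and idempotence expectations above follow from UTF-8 encoding lengths, which can be checked with `char::len_utf8`. This is a sketch under the assumption that the marker is U+E000 and that each character is replaced by exactly one marker; the `byte_delta` helper is hypothetical and is not the transform's actual API.

```rust
/// Assumed marker (U+E000, as mentioned in the testing notes).
const MARKER: char = '\u{E000}';

/// Predicate from this PR.
pub fn is_substitutable_whitespace(c: char) -> bool {
    c.is_whitespace()
        || matches!(c, '\u{200B}' | '\u{200C}' | '\u{200D}' | '\u{2060}' | '\u{FEFF}')
}

/// Hypothetical helper: change in UTF-8 byte length when `c` is replaced
/// by a single MARKER.
fn byte_delta(c: char) -> isize {
    MARKER.len_utf8() as isize - c.len_utf8() as isize
}

fn main() {
    let samples = [
        ' ', '\t', '\n', '\u{0B}', '\u{0C}', '\u{A0}',
        '\u{200B}', '\u{200C}', '\u{200D}', '\u{2060}', '\u{FEFF}',
    ];
    for c in samples {
        println!(
            "U+{:04X}: {} byte(s), delta {}",
            c as u32,
            c.len_utf8(),
            byte_delta(c)
        );
    }
    // The five zero-width codepoints and U+E000 are all 3 bytes in UTF-8.
    assert_eq!(byte_delta('\u{200B}'), 0);
    // ASCII whitespace grows by 2 bytes, NBSP (2 bytes) by 1.
    assert_eq!(byte_delta(' '), 2);
    assert_eq!(byte_delta('\u{A0}'), 1);
    // The marker is not itself substitutable, so a second pass finds
    // nothing to replace and its byte delta is zero.
    assert!(!is_substitutable_whitespace(MARKER));
}
```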

Verification

  • cargo fmt --all --check clean
  • cargo clippy --workspace -- -D warnings clean
  • cargo test -p llmtrace-security 595/595 pass
  • All 18 datamarking unit tests pass (5 zero-width tests, i.e. the inverted ZWSP test plus 4 new per-codepoint tests, and 13 others)

Note: a workspace-wide cargo test --workspace flagged llmtrace-proxy::tests::test_debug_verdicts_returns_404_when_flag_off as failing under parallel contention (502 vs expected 404), but the same test passes in isolation. That test does not touch datamarking and the failure reproduces on main under the same parallel conditions — it is unrelated to this change.

Test plan

Closes #215.

…rs (closes #215)

PR #214 used `char::is_whitespace()` as the datamarking classifier. That
predicate follows the Unicode `White_Space` property, which excludes
zero-width formatting codepoints (ZWSP `U+200B`, ZWNJ `U+200C`,
ZWJ `U+200D`, WJ `U+2060`, BOM `U+FEFF`). Those codepoints are
documented prompt-injection vectors used to smuggle invisible
instructions inside otherwise-benign Data zones, so they were
passing through the transform unchanged.

Add `is_substitutable_whitespace(c)` = `c.is_whitespace()` plus the
five zero-width codepoints, and use it in the substitution loop in
place of the bare `char::is_whitespace` call.

Tests:
- `zwsp_is_not_substituted_by_design` renamed/inverted to
  `zwsp_is_substituted` (now asserts ZWSP IS replaced).
- New per-codepoint coverage: zwnj/zwj/word_joiner/bom.
- `mixed_whitespace_classes_all_substituted` left unchanged
  (still validates the Unicode `White_Space` set).
- `idempotence_apply_twice_equals_apply_once` extended with a
  mixed ordinary + zero-width input to exercise both classifier
  branches; second pass remains a no-op.
@epappas epappas merged commit 738da08 into main May 17, 2026
15 checks passed
Development

Successfully merging this pull request may close these issues.

is-060: datamarking does not substitute ZWSP (U+200B) — follow-up to PR #214
